Seamless Coarse Grained Parallelism Integration in Intensive Bioinformatics Workflows
To be easily constructed, shared and maintained, complex in silico bioinformatics analyses are structured as workflows. Furthermore, the growing computational power and storage demands of this domain require workflows to be executed efficiently. However, workflow performance usually relies on the designer's ability to extract potential parallelism, and atomic bioinformatics tasks often do not exhibit direct parallelism, which may only appear later in the workflow design process. In this paper, we propose a Model-Driven Architecture approach for capturing the complete design process of bioinformatics workflows. More precisely, two workflow models are specified: the first, called the design model, graphically captures a low-throughput prototype; the second, called the execution model, specifies multiple levels of coarse-grained parallelism. The execution model is automatically generated from the design model using annotations derived from the EDAM ontology; these annotations describe the data types connecting the different elementary tasks. The execution model can then be interpreted by a workflow engine and executed on hardware with intensive computation facilities.
Quality metrics for benchmarking sequences comparison tools
Comparing sequences is a daily task in bioinformatics, and many software tools try to fulfill this need by offering fast execution times and accurate results. Introducing a new tool in this field requires comparing it to recognized tools with the help of well-defined metrics. We propose a set of quality metrics that enables a systematic approach to comparing alignment tools. These metrics have been implemented in a dedicated program that produces textual and graphical benchmark artifacts.
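The abstract does not detail the metrics themselves, but a common building block for benchmarking aligners is comparing reported read placements against a known truth set. The sketch below is purely illustrative (the function and data names are assumptions, not the paper's actual metrics):

```python
# Hypothetical sketch: precision/recall of an aligner's reported read
# placements against a gold-standard truth set. Names are illustrative.

def placement_metrics(reported, truth):
    """reported, truth: dicts mapping read id -> (reference, position)."""
    # A placement is correct when it matches the truth set exactly.
    correct = sum(1 for r, loc in reported.items() if truth.get(r) == loc)
    precision = correct / len(reported) if reported else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall

truth = {"r1": ("chr1", 100), "r2": ("chr1", 250), "r3": ("chr2", 40)}
reported = {"r1": ("chr1", 100), "r2": ("chr1", 260)}
p, r = placement_metrics(reported, truth)
# p == 0.5 (1 of 2 reported placements correct), r == 1/3
```

Real benchmarks typically also tolerate small position offsets and account for multi-mapping reads; this minimal version only checks exact matches.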
GASSST: global alignment short sequence search tool
Motivation: The rapid development of next-generation sequencing technologies able to produce huge amounts of sequence data is leading to a wide range of new applications. This triggers the need for fast and accurate alignment software. Common techniques often restrict indels in the alignment to improve speed, whereas more flexible aligners are too slow for large-scale applications. Moreover, many current aligners are becoming inefficient as generated reads grow ever larger. Our goal with our new aligner GASSST (Global Alignment Short Sequence Search Tool) is thus twofold: achieving high performance with no restrictions on the number of indels, with a design that remains effective on long reads.
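For context on what "no restrictions on the number of indels" means, the classic unrestricted formulation is Needleman-Wunsch global alignment, where every cell considers a match/mismatch, an insertion, or a deletion. The sketch below shows that scoring recurrence only; GASSST's actual contribution (its filtering and indexing strategy) is far more involved:

```python
# Illustrative Needleman-Wunsch scoring: global alignment with an
# unbounded number of indels. Not GASSST's algorithm, just the baseline
# dynamic-programming recurrence that unrestricted aligners build on.

def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    # dp[i][j]: best score aligning a[:i] with b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        dp[i][0] = i * gap          # a aligned against leading gaps
    for j in range(1, len(b) + 1):
        dp[0][j] = j * gap          # b aligned against leading gaps
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,  # match/mismatch
                           dp[i - 1][j] + gap,      # deletion
                           dp[i][j - 1] + gap)      # insertion
    return dp[-1][-1]

nw_score("ACGT", "ACGT")  # 4 (all matches)
nw_score("ACGT", "AGT")   # 2 (three matches, one gap)
```

The quadratic cost of this recurrence is exactly why fast aligners add seeding and filtering layers on top of it.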
Parallelization of the K-means algorithm on a reconfigurable system: Application to hyper-spectral images
The article presents a parallel architecture dedicated to the k-means algorithm, used for clustering objects in a non-hierarchical way. We propose an implementation on a reconfigurable system made of a PC and an FPGA board closely coupled through the I/O bus. Experiments carried out on a set of hyper-spectral images show that the computation time is reduced from a few hours to a few minutes. We also point out the influence of the quality of the link between the processor and the FPGA board on the overall system performance.
An Integrated Systolic Array for Typing-Error Correction
This report presents the design of a VLSI circuit dedicated to the correction of typing errors. The circuit architecture is based on a regular structure: a two-dimensional systolic array of 69 processors. The methodology followed during the design of the circuit takes advantage of this regularity, particularly during the validation phases.
Multiple Comparative Metagenomics using Multiset k-mer Counting
Background. Large-scale metagenomic projects aim to extract biodiversity knowledge from different environmental conditions. Current methods for comparing microbial communities face important limitations. Those based on taxonomic or functional assignment rely on the small subset of sequences that can be associated with known organisms. On the other hand, de novo methods, which compare the whole sets of sequences, either do not scale up to ambitious metagenomic projects or do not provide precise and exhaustive results.
Methods. These limitations motivated the development of a new de novo metagenomic comparative method, called Simka. This method computes a large collection of standard ecological distances by replacing species counts with k-mer counts. Simka scales up to today's metagenomic projects thanks to a new parallel k-mer counting strategy over multiple datasets.
Results. Experiments on public Human Microbiome Project datasets demonstrate that Simka captures the essential underlying biological structure. Simka was able to compute, in a few hours, both qualitative and quantitative ecological distances on hundreds of metagenomic samples (690 samples, 32 billion reads). We also demonstrate that analyzing metagenomes at the k-mer level is highly correlated with extremely precise de novo comparison techniques that rely on all-versus-all sequence alignment strategies or are based on taxonomic profiling.
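The core idea, replacing species counts with k-mer counts in a standard ecological distance, can be shown in a few lines. This toy sketch is an assumption-laden illustration (the function names and the choice of Bray-Curtis here are ours, not Simka's API, and Simka's parallel counting strategy is precisely what this naive version lacks):

```python
# Toy illustration of the Simka idea: count k-mers per sample, then feed
# the counts to a standard ecological distance (Bray-Curtis here).
# Names are illustrative; this is not Simka's actual implementation.
from collections import Counter

def kmer_counts(reads, k=4):
    c = Counter()
    for r in reads:
        for i in range(len(r) - k + 1):
            c[r[i:i + k]] += 1
    return c

def bray_curtis(a, b):
    # 1 - 2*shared / (total_a + total_b), computed on k-mer multisets
    # exactly as it would be on species abundance tables.
    shared = sum(min(a[x], b[x]) for x in a.keys() & b.keys())
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))

s1 = kmer_counts(["ACGTACGT", "ACGTTTTT"])
s2 = kmer_counts(["ACGTACGT", "GGGGGGGG"])
d = bray_curtis(s1, s2)  # 0 = identical k-mer content, 1 = disjoint
```

The qualitative/quantitative split mentioned in the abstract corresponds to presence/absence distances (e.g. Jaccard) versus abundance-weighted ones like the Bray-Curtis form above.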
A Component model for synchronous VLSI system design
Available in the files attached to this document
- …